Overview

Dataset statistics

Number of variables: 17
Number of observations: 62642
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 6.2 MiB
Average record size in memory: 103.2 B
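
The two memory figures can be cross-checked against each other. The sketch below multiplies the rounded average record size by the number of observations; because both inputs are rounded in the report, only approximate agreement with the stated total should be expected.

```python
# Values copied from the dataset statistics; both are rounded in the report.
n_rows = 62642
avg_record_bytes = 103.2

total_bytes = n_rows * avg_record_bytes
total_mib = total_bytes / 2**20  # 1 MiB = 2**20 bytes

print(f"{total_mib:.1f} MiB")
```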

Variable types

DateTime: 1
Categorical: 8
Numeric: 8

Alerts

company has a high cardinality: 1871 distinct values [High cardinality]
level has a high cardinality: 3074 distinct values [High cardinality]
location has a high cardinality: 1050 distinct values [High cardinality]
tag has a high cardinality: 3276 distinct values [High cardinality]
otherdetails has a high cardinality: 40133 distinct values [High cardinality]
dmaid has a high cardinality: 150 distinct values [High cardinality]
totalyearlycompensation is highly correlated with basesalary and 1 other field [High correlation; reported 3 times]
totalyearlycompensation is highly correlated with basesalary and 2 other fields [High correlation]
basesalary is highly correlated with totalyearlycompensation [High correlation; reported 4 times]
stockgrantvalue is highly correlated with totalyearlycompensation [High correlation; reported 4 times]
bonus is highly correlated with totalyearlycompensation [High correlation]
yearsofexperience is highly correlated with yearsatcompany [High correlation]
yearsatcompany is highly correlated with yearsofexperience [High correlation]
totalyearlycompensation is highly skewed (γ1 = 32.04993004) [Skewed]
basesalary is highly skewed (γ1 = 30.92064431) [Skewed]
stockgrantvalue is highly skewed (γ1 = 64.07966542) [Skewed]
bonus is highly skewed (γ1 = 36.90025253) [Skewed]
rowNumber has unique values [Unique]
totalyearlycompensation has 2297 (3.7%) zeros [Zeros]
yearsofexperience has 4614 (7.4%) zeros [Zeros]
yearsatcompany has 16203 (25.9%) zeros [Zeros]
basesalary has 2304 (3.7%) zeros [Zeros]
stockgrantvalue has 17181 (27.4%) zeros [Zeros]
bonus has 15427 (24.6%) zeros [Zeros]
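
The γ1 values in the Skewed alerts are skewness coefficients. The following is a minimal sketch of the moment-based definition using only the standard library; the function name and the toy data are illustrative. The profiler probably relies on pandas, whose `skew` applies a bias correction, so its output can differ slightly from this version.

```python
def skewness(values):
    """Moment-based skewness: third central moment over variance**1.5."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n  # population variance
    m3 = sum((v - mu) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3]
right_tailed = [0, 0, 0, 0, 10]  # one large value pulls the tail right

print(skewness(symmetric), skewness(right_tailed))
```

A single extreme value is enough to produce a clearly positive coefficient, which matches the pattern of the compensation columns: small medians with maxima in the millions.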

Reproduction

Analysis started: 2022-05-06 00:37:03.674282
Analysis finished: 2022-05-06 00:38:44.162492
Duration: 1 minute and 40.49 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json
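
A report of this kind is typically produced with a few lines of pandas-profiling. The sketch below assumes pandas and pandas-profiling are installed; the file paths and the report title are hypothetical, and the configuration actually used for this report is not reproduced here.

```python
def build_report(csv_path, html_path):
    """Profile a CSV file and write an HTML report (paths are hypothetical)."""
    # Imported lazily so this module loads even where the libraries are absent.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    profile = ProfileReport(df, title="Dataset profile")
    profile.to_file(html_path)
    return profile


# Example call (not executed here):
# build_report("salaries.csv", "report.html")
```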

Variables

Distinct: 62561
Distinct (%): 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 489.5 KiB
Minimum: 2017-06-07 11:33:27
Maximum: 2021-08-17 08:28:57
Histogram with fixed size bins (bins=50)

company
Categorical

HIGH CARDINALITY

Distinct: 1871
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Memory size: 201.6 KiB

Amazon                8054
Microsoft             5162
Google                4313
Facebook              2962
Apple                 2015
Other values (1866)   40136

Length

Max length: 39
Median length: 31
Mean length: 7.653650905
Min length: 2
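
The length statistics are plain summaries of `len(value)` over the column. The sketch below computes them for the five sample rows of this variable; the helper name is illustrative. It also includes a small consistency check: when at least half of the values are at or above the median and all are at or above the minimum, the mean cannot fall below (median + min) / 2. The reported median of 31 gives a bound of 16.5, which exceeds the reported mean of 7.65, so the median length figure should be treated with caution.

```python
from statistics import mean, median


def length_stats(values):
    """Max, median, mean and min of the string lengths in `values`."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
        "min": min(lengths),
    }


sample = ["Oracle", "eBay", "Amazon", "Microsoft", "Amazon"]
stats = length_stats(sample)

# Lower bound on the mean implied by the reported median (31) and min (2).
mean_lower_bound = (31 + 2) / 2
reported_mean = 7.653650905
consistent = reported_mean >= mean_lower_bound
```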

Characters and Unicode

Total characters: 479440
Distinct characters: 71
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
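
As an illustration of these character properties, the sketch below counts Unicode general categories with the standard library's `unicodedata` module. The helper name and the sample values (the five sample rows of this variable) are chosen for illustration. Category codes such as `Lu`, `Ll` and `Zs` correspond to the Uppercase Letter, Lowercase Letter and Space Separator rows in the tables that follow. Scripts and blocks are not exposed by `unicodedata` and are not covered here.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


sample = ["Oracle", "eBay", "Amazon", "Microsoft", "Amazon"]
counts = category_counts(sample)
print(counts.most_common())
```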

Unique

Unique: 518
Unique (%): 0.8%

Sample

1st row: Oracle
2nd row: eBay
3rd row: Amazon
4th row: Microsoft
5th row: Amazon

Common Values

Value                 Count   Frequency (%)
Amazon                8054    12.9%
Microsoft             5162    8.2%
Google                4313    6.9%
Facebook              2962    4.7%
Apple                 2015    3.2%
Oracle                1122    1.8%
Salesforce            1042    1.7%
Intel                 941     1.5%
Cisco                 901     1.4%
IBM                   900     1.4%
Other values (1861)   35230   56.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
amazon8225
 
11.4%
microsoft5252
 
7.3%
google4368
 
6.1%
facebook3004
 
4.2%
apple2057
 
2.9%
oracle1143
 
1.6%
salesforce1065
 
1.5%
intel992
 
1.4%
cisco959
 
1.3%
ibm927
 
1.3%
Other values (1187)44157
61.2%

Most occurring characters

ValueCountFrequency (%)
o54590
 
11.4%
e38755
 
8.1%
a37999
 
7.9%
n25341
 
5.3%
r24063
 
5.0%
i23604
 
4.9%
l22595
 
4.7%
t22546
 
4.7%
c19438
 
4.1%
s18439
 
3.8%
Other values (61)192070
40.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter383906
80.1%
Uppercase Letter84399
 
17.6%
Space Separator10111
 
2.1%
Other Punctuation524
 
0.1%
Decimal Number300
 
0.1%
Dash Punctuation184
 
< 0.1%
Open Punctuation8
 
< 0.1%
Close Punctuation8
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o54590
14.2%
e38755
10.1%
a37999
 
9.9%
n25341
 
6.6%
r24063
 
6.3%
i23604
 
6.1%
l22595
 
5.9%
t22546
 
5.9%
c19438
 
5.1%
s18439
 
4.8%
Other values (16)96536
25.1%
Uppercase Letter
ValueCountFrequency (%)
A14920
17.7%
M10004
11.9%
S6729
 
8.0%
G6420
 
7.6%
C5359
 
6.3%
I4392
 
5.2%
F3890
 
4.6%
B3807
 
4.5%
P3435
 
4.1%
T3022
 
3.6%
Other values (16)22421
26.6%
Decimal Number
ValueCountFrequency (%)
388
29.3%
152
17.3%
545
15.0%
241
13.7%
722
 
7.3%
620
 
6.7%
018
 
6.0%
814
 
4.7%
Other Punctuation
ValueCountFrequency (%)
.295
56.3%
&141
26.9%
'73
 
13.9%
,5
 
1.0%
*5
 
1.0%
/5
 
1.0%
Space Separator
ValueCountFrequency (%)
10110
> 99.9%
 1
 
< 0.1%
Dash Punctuation
ValueCountFrequency (%)
-184
100.0%
Open Punctuation
ValueCountFrequency (%)
(8
100.0%
Close Punctuation
ValueCountFrequency (%)
)8
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin468305
97.7%
Common11135
 
2.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
o54590
 
11.7%
e38755
 
8.3%
a37999
 
8.1%
n25341
 
5.4%
r24063
 
5.1%
i23604
 
5.0%
l22595
 
4.8%
t22546
 
4.8%
c19438
 
4.2%
s18439
 
3.9%
Other values (42)180935
38.6%
Common
ValueCountFrequency (%)
10110
90.8%
.295
 
2.6%
-184
 
1.7%
&141
 
1.3%
388
 
0.8%
'73
 
0.7%
152
 
0.5%
545
 
0.4%
241
 
0.4%
722
 
0.2%
Other values (9)84
 
0.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII479439
> 99.9%
None1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o54590
 
11.4%
e38755
 
8.1%
a37999
 
7.9%
n25341
 
5.3%
r24063
 
5.0%
i23604
 
4.9%
l22595
 
4.7%
t22546
 
4.7%
c19438
 
4.1%
s18439
 
3.8%
Other values (60)192069
40.1%
None
ValueCountFrequency (%)
 1
100.0%

level
Categorical

HIGH CARDINALITY

Distinct: 3074
Distinct (%): 4.9%
Missing: 0
Missing (%): 0.0%
Memory size: 211.0 KiB

L4                         5008
L5                         4861
L3                         3331
L6                         2866
Senior Software Engineer   1433
Other values (3069)        45143

Length

Max length: 59
Median length: 54
Mean length: 5.904808276
Min length: 1

Characters and Unicode

Total characters: 369889
Distinct characters: 101
Distinct categories: 13
Distinct scripts: 4
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1798
Unique (%): 2.9%

Sample

1st row: L3
2nd row: SE 2
3rd row: L7
4th row: 64
5th row: L5

Common Values

Value                      Count   Frequency (%)
L4                         5008    8.0%
L5                         4861    7.8%
L3                         3331    5.3%
L6                         2866    4.6%
Senior Software Engineer   1433    2.3%
L2                         1162    1.9%
Senior                     1048    1.7%
L7                         916     1.5%
L1                         764     1.2%
62                         764     1.2%
Other values (3064)        40489   64.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
engineer6267
 
7.3%
senior5702
 
6.6%
l45108
 
5.9%
l55035
 
5.8%
software4952
 
5.7%
l33419
 
4.0%
l63286
 
3.8%
sde1630
 
1.9%
staff1601
 
1.9%
associate1516
 
1.8%
Other values (1356)47657
55.3%

Most occurring characters

ValueCountFrequency (%)
e35149
 
9.5%
n25546
 
6.9%
r25426
 
6.9%
23967
 
6.5%
L21638
 
5.8%
i18781
 
5.1%
S17487
 
4.7%
a17118
 
4.6%
o15205
 
4.1%
t12983
 
3.5%
Other values (91)156589
42.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter198285
53.6%
Uppercase Letter92240
24.9%
Decimal Number53357
 
14.4%
Space Separator23967
 
6.5%
Dash Punctuation1117
 
0.3%
Other Punctuation410
 
0.1%
Close Punctuation248
 
0.1%
Open Punctuation248
 
0.1%
Math Symbol8
 
< 0.1%
Modifier Symbol4
 
< 0.1%
Other values (3)5
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e35149
17.7%
n25546
12.9%
r25426
12.8%
i18781
9.5%
a17118
8.6%
o15205
7.7%
t12983
 
6.5%
g8860
 
4.5%
f8472
 
4.3%
s5691
 
2.9%
Other values (31)25054
12.6%
Uppercase Letter
ValueCountFrequency (%)
L21638
23.5%
S17487
19.0%
E12233
13.3%
I9539
10.3%
C5903
 
6.4%
T4971
 
5.4%
M4819
 
5.2%
P3523
 
3.8%
D3490
 
3.8%
A2801
 
3.0%
Other values (19)5836
 
6.3%
Decimal Number
ValueCountFrequency (%)
69750
18.3%
49612
18.0%
59207
17.3%
38257
15.5%
25770
10.8%
14804
9.0%
72135
 
4.0%
01636
 
3.1%
91165
 
2.2%
81021
 
1.9%
Other Punctuation
ValueCountFrequency (%)
/204
49.8%
.149
36.3%
,28
 
6.8%
&16
 
3.9%
?6
 
1.5%
#3
 
0.7%
:2
 
0.5%
;1
 
0.2%
\1
 
0.2%
Dash Punctuation
ValueCountFrequency (%)
-1116
99.9%
1
 
0.1%
Math Symbol
ValueCountFrequency (%)
|7
87.5%
+1
 
12.5%
Other Letter
ValueCountFrequency (%)
1
50.0%
1
50.0%
Space Separator
ValueCountFrequency (%)
23967
100.0%
Close Punctuation
ValueCountFrequency (%)
)248
100.0%
Open Punctuation
ValueCountFrequency (%)
(248
100.0%
Modifier Symbol
ValueCountFrequency (%)
`4
100.0%
Currency Symbol
ValueCountFrequency (%)
$2
100.0%
Connector Punctuation
ValueCountFrequency (%)
_1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin290501
78.5%
Common79362
 
21.5%
Cyrillic24
 
< 0.1%
Han2
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e35149
 
12.1%
n25546
 
8.8%
r25426
 
8.8%
L21638
 
7.4%
i18781
 
6.5%
S17487
 
6.0%
a17118
 
5.9%
o15205
 
5.2%
t12983
 
4.5%
E12233
 
4.2%
Other values (42)88935
30.6%
Common
ValueCountFrequency (%)
23967
30.2%
69750
12.3%
49612
12.1%
59207
 
11.6%
38257
 
10.4%
25770
 
7.3%
14804
 
6.1%
72135
 
2.7%
01636
 
2.1%
91165
 
1.5%
Other values (19)3059
 
3.9%
Cyrillic
ValueCountFrequency (%)
а2
 
8.3%
л2
 
8.3%
о2
 
8.3%
в2
 
8.3%
е2
 
8.3%
и2
 
8.3%
я1
 
4.2%
р1
 
4.2%
н1
 
4.2%
п1
 
4.2%
Other values (8)8
33.3%
Han
ValueCountFrequency (%)
1
50.0%
1
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII369862
> 99.9%
Cyrillic24
 
< 0.1%
CJK2
 
< 0.1%
Punctuation1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e35149
 
9.5%
n25546
 
6.9%
r25426
 
6.9%
23967
 
6.5%
L21638
 
5.9%
i18781
 
5.1%
S17487
 
4.7%
a17118
 
4.6%
o15205
 
4.1%
t12983
 
3.5%
Other values (70)156562
42.3%
Cyrillic
ValueCountFrequency (%)
а2
 
8.3%
л2
 
8.3%
о2
 
8.3%
в2
 
8.3%
е2
 
8.3%
и2
 
8.3%
я1
 
4.2%
р1
 
4.2%
н1
 
4.2%
п1
 
4.2%
Other values (8)8
33.3%
CJK
ValueCountFrequency (%)
1
50.0%
1
50.0%
Punctuation
ValueCountFrequency (%)
1
100.0%

title
Categorical

Distinct: 15
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 489.5 KiB

Software Engineer              41231
Product Manager                4673
Software Engineering Manager   3569
Data Scientist                 2578
Hardware Engineer              2200
Other values (10)              8391

Length

Max length: 28
Median length: 17
Mean length: 17.34033077
Min length: 5

Characters and Unicode

Total characters: 1086233
Distinct characters: 31
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Product Manager
2nd row: Software Engineer
3rd row: Product Manager
4th row: Software Engineering Manager
5th row: Software Engineer

Common Values

Value                          Count   Frequency (%)
Software Engineer              41231   65.8%
Product Manager                4673    7.5%
Software Engineering Manager   3569    5.7%
Data Scientist                 2578    4.1%
Hardware Engineer              2200    3.5%
Product Designer               1516    2.4%
Technical Program Manager      1381    2.2%
Solution Architect             1157    1.8%
Management Consultant          976     1.6%
Business Analyst               885     1.4%
Other values (5)               2476    4.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
software44800
34.8%
engineer43921
34.2%
manager9623
 
7.5%
product6189
 
4.8%
engineering3569
 
2.8%
data2578
 
2.0%
scientist2578
 
2.0%
hardware2200
 
1.7%
designer1516
 
1.2%
technical1381
 
1.1%
Other values (13)10257
 
8.0%

Most occurring characters

ValueCountFrequency (%)
e165879
15.3%
n122042
11.2%
r119913
11.0%
a82692
 
7.6%
t67168
 
6.2%
65970
 
6.1%
g65265
 
6.0%
i63962
 
5.9%
o56024
 
5.2%
S48996
 
4.5%
Other values (21)228322
21.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter891651
82.1%
Uppercase Letter128612
 
11.8%
Space Separator65970
 
6.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e165879
18.6%
n122042
13.7%
r119913
13.4%
a82692
9.3%
t67168
7.5%
g65265
 
7.3%
i63962
 
7.2%
o56024
 
6.3%
w47000
 
5.3%
f44800
 
5.0%
Other values (9)56906
 
6.4%
Uppercase Letter
ValueCountFrequency (%)
S48996
38.1%
E47490
36.9%
M11799
 
9.2%
P7570
 
5.9%
D4094
 
3.2%
H2564
 
2.0%
A2042
 
1.6%
T1381
 
1.1%
C976
 
0.8%
B885
 
0.7%
Space Separator
ValueCountFrequency (%)
65970
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1020263
93.9%
Common65970
 
6.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e165879
16.3%
n122042
12.0%
r119913
11.8%
a82692
8.1%
t67168
 
6.6%
g65265
 
6.4%
i63962
 
6.3%
o56024
 
5.5%
S48996
 
4.8%
E47490
 
4.7%
Other values (20)180832
17.7%
Common
ValueCountFrequency (%)
65970
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1086233
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e165879
15.3%
n122042
11.2%
r119913
11.0%
a82692
 
7.6%
t67168
 
6.2%
65970
 
6.1%
g65265
 
6.0%
i63962
 
5.9%
o56024
 
5.2%
S48996
 
4.5%
Other values (21)228322
21.0%

totalyearlycompensation
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
SKEWED
ZEROS

Distinct: 1836
Distinct (%): 2.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3175.327065
Minimum: 0
Maximum: 3386013
Zeros: 2297
Zeros (%): 3.7%
Negative: 0
Negative (%): 0.0%
Memory size: 489.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 27
Q1: 116
median: 183
Q3: 269
95-th percentile: 507
Maximum: 3386013
Range: 3386013
Interquartile range (IQR): 153
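
The range and IQR follow directly from the other quantiles: range = maximum - minimum and IQR = Q3 - Q1. The sketch below checks the reported values and shows how quartiles can be computed with the standard library on hypothetical toy data. The `inclusive` method (linear interpolation between data points) is an assumption about how the profiler interpolates, not something the report states.

```python
from statistics import quantiles

# Reported values for totalyearlycompensation.
report_iqr = 269 - 116        # Q3 - Q1
report_range = 3386013 - 0    # maximum - minimum

# Quartiles of a small hypothetical sample.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
```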

Descriptive statistics

Standard deviation: 38778.77419
Coefficient of variation (CV): 12.2125291
Kurtosis: 1799.206627
Mean: 3175.327065
Median Absolute Deviation (MAD): 74
Skewness: 32.04993004
Sum: 198908838
Variance: 1503793327
Monotonicity: Not monotonic
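
Several of these statistics are tied together by simple identities: mean = sum / n, variance = (standard deviation)², and CV = standard deviation / mean. The sketch below verifies the reported figures against those identities, using the observation count from the dataset statistics; small differences are expected because the report rounds its values.

```python
n = 62642                    # number of observations
total = 198908838            # Sum
std = 38778.77419            # Standard deviation
reported_mean = 3175.327065
reported_cv = 12.2125291
reported_variance = 1503793327

mean = total / n
variance = std ** 2
cv = std / reported_mean

print(mean, variance, cv)
```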
Histogram with fixed size bins (bins=50)
Value                 Count   Frequency (%)
0                     2297    3.7%
157                   280     0.4%
129                   268     0.4%
137                   265     0.4%
148                   263     0.4%
141                   260     0.4%
143                   259     0.4%
187                   259     0.4%
151                   257     0.4%
192                   257     0.4%
Other values (1826)   57977   92.6%

Value   Count   Frequency (%)
0       2297    3.7%
3       2       < 0.1%
4       1       < 0.1%
5       3       < 0.1%
6       5       < 0.1%
7       12      < 0.1%
8       23      < 0.1%
9       22      < 0.1%
10      32      0.1%
11      18      < 0.1%

Value     Count   Frequency (%)
3386013   1       < 0.1%
2593369   1       < 0.1%
2111744   1       < 0.1%
2057772   1       < 0.1%
1949865   1       < 0.1%
1242825   1       < 0.1%
1205904   1       < 0.1%
1162164   1       < 0.1%
1160132   1       < 0.1%
1115270   1       < 0.1%

location
Categorical

HIGH CARDINALITY

Distinct: 1050
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Memory size: 163.0 KiB

Seattle, WA           8701
San Francisco, CA     6797
New York, NY          4562
Redmond, WA           2649
Mountain View, CA     2275
Other values (1045)   37658

Length

Max length: 41
Median length: 38
Mean length: 14.1890425
Min length: 7

Characters and Unicode

Total characters: 888830
Distinct characters: 59
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 402
Unique (%): 0.6%

Sample

1st row: Redwood City, CA
2nd row: San Francisco, CA
3rd row: Seattle, WA
4th row: Redmond, WA
5th row: Vancouver, BC, Canada

Common Values

Value                 Count   Frequency (%)
Seattle, WA           8701    13.9%
San Francisco, CA     6797    10.9%
New York, NY          4562    7.3%
Redmond, WA           2649    4.2%
Mountain View, CA     2275    3.6%
Sunnyvale, CA         2248    3.6%
San Jose, CA          2047    3.3%
Austin, TX            1527    2.4%
Menlo Park, CA        1440    2.3%
Cupertino, CA         1431    2.3%
Other values (1040)   28965   46.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
ca22824
 
14.2%
wa12356
 
7.7%
san10169
 
6.3%
seattle8701
 
5.4%
francisco6823
 
4.2%
ny4715
 
2.9%
new4609
 
2.9%
york4572
 
2.8%
india2830
 
1.8%
tx2702
 
1.7%
Other values (1267)80556
50.1%

Most occurring characters

ValueCountFrequency (%)
98215
 
11.0%
,72468
 
8.2%
a69382
 
7.8%
n63337
 
7.1%
e55820
 
6.3%
o47779
 
5.4%
A46957
 
5.3%
t37857
 
4.3%
i33675
 
3.8%
C33441
 
3.8%
Other values (49)329899
37.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter494333
55.6%
Uppercase Letter223573
25.2%
Space Separator98215
 
11.0%
Other Punctuation72537
 
8.2%
Dash Punctuation110
 
< 0.1%
Open Punctuation31
 
< 0.1%
Close Punctuation31
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a69382
14.0%
n63337
12.8%
e55820
11.3%
o47779
9.7%
t37857
7.7%
i33675
 
6.8%
l30145
 
6.1%
r29899
 
6.0%
s20428
 
4.1%
d19875
 
4.0%
Other values (16)86136
17.4%
Uppercase Letter
ValueCountFrequency (%)
A46957
21.0%
C33441
15.0%
S25934
11.6%
N14617
 
6.5%
W13619
 
6.1%
Y9558
 
4.3%
M8742
 
3.9%
F7882
 
3.5%
B6524
 
2.9%
T6189
 
2.8%
Other values (16)50110
22.4%
Other Punctuation
ValueCountFrequency (%)
,72468
99.9%
.67
 
0.1%
'2
 
< 0.1%
Space Separator
ValueCountFrequency (%)
98215
100.0%
Dash Punctuation
ValueCountFrequency (%)
-110
100.0%
Open Punctuation
ValueCountFrequency (%)
(31
100.0%
Close Punctuation
ValueCountFrequency (%)
)31
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin717906
80.8%
Common170924
 
19.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
a69382
 
9.7%
n63337
 
8.8%
e55820
 
7.8%
o47779
 
6.7%
A46957
 
6.5%
t37857
 
5.3%
i33675
 
4.7%
C33441
 
4.7%
l30145
 
4.2%
r29899
 
4.2%
Other values (42)269614
37.6%
Common
ValueCountFrequency (%)
98215
57.5%
,72468
42.4%
-110
 
0.1%
.67
 
< 0.1%
(31
 
< 0.1%
)31
 
< 0.1%
'2
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII888830
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
98215
 
11.0%
,72468
 
8.2%
a69382
 
7.8%
n63337
 
7.1%
e55820
 
6.3%
o47779
 
5.4%
A46957
 
5.3%
t37857
 
4.3%
i33675
 
3.8%
C33441
 
3.8%
Other values (49)329899
37.1%

yearsofexperience
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 56
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.191053925
Minimum: 0
Maximum: 90
Zeros: 4614
Zeros (%): 7.4%
Negative: 0
Negative (%): 0.0%
Memory size: 489.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 3
median: 6
Q3: 10
95-th percentile: 20
Maximum: 90
Range: 90
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 6.437148272
Coefficient of variation (CV): 0.8951606174
Kurtosis: 3.40735208
Mean: 7.191053925
Median Absolute Deviation (MAD): 4
Skewness: 1.536440913
Sum: 450462
Variance: 41.43687787
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
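
A histogram with fixed size bins splits the interval from the minimum to the maximum into equally wide bins and counts the values falling in each. The following is a minimal standard-library sketch of that idea; the function name and toy data are illustrative, and the report's plots were drawn with Matplotlib rather than this code.

```python
def fixed_width_histogram(values, bins):
    """Return (counts, edges) for `bins` equal-width bins over [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values must span a non-zero range")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value belongs to the last bin, not a bin past the end.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


counts, edges = fixed_width_histogram(list(range(11)), bins=5)
print(counts)
```

With a maximum of 90 and most values below 20, a 50-bin histogram of this variable places nearly all observations in the first dozen bins, which is why the long right tail is hard to see in such plots.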
Value               Count   Frequency (%)
3                   6413    10.2%
1                   5937    9.5%
4                   5430    8.7%
2                   4683    7.5%
0                   4614    7.4%
6                   4555    7.3%
5                   4236    6.8%
7                   3768    6.0%
9                   2918    4.7%
8                   2796    4.5%
Other values (46)   17292   27.6%

Value   Count   Frequency (%)
0       4614    7.4%
1       5937    9.5%
2       4683    7.5%
3       6413    10.2%
4       5430    8.7%
5       4236    6.8%
6       4555    7.3%
7       3768    6.0%
8       2796    4.5%
9       2918    4.7%

Value   Count   Frequency (%)
90      1       < 0.1%
58      1       < 0.1%
57      3       < 0.1%
53      2       < 0.1%
51      2       < 0.1%
50      2       < 0.1%
49      3       < 0.1%
48      5       < 0.1%
47      2       < 0.1%
46      1       < 0.1%

yearsatcompany
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 44
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.7006162
Minimum: 0
Maximum: 90
Zeros: 16203
Zeros (%): 25.9%
Negative: 0
Negative (%): 0.0%
Memory size: 489.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1
Q3: 4
95-th percentile: 9
Maximum: 90
Range: 90
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 3.524030209
Coefficient of variation (CV): 1.304898567
Kurtosis: 17.45339846
Mean: 2.7006162
Median Absolute Deviation (MAD): 1
Skewness: 2.973821206
Sum: 169172
Variance: 12.41878891
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=44)
Value               Count   Frequency (%)
0                   16203   25.9%
1                   15545   24.8%
3                   7821    12.5%
2                   6885    11.0%
4                   4667    7.5%
6                   2549    4.1%
5                   2467    3.9%
7                   1677    2.7%
9                   931     1.5%
8                   893     1.4%
Other values (34)   3004    4.8%

Value   Count   Frequency (%)
0       16203   25.9%
1       15545   24.8%
2       6885    11.0%
3       7821    12.5%
4       4667    7.5%
5       2467    3.9%
6       2549    4.1%
7       1677    2.7%
8       893     1.4%
9       931     1.5%

Value   Count   Frequency (%)
90      1       < 0.1%
47      1       < 0.1%
43      1       < 0.1%
42      1       < 0.1%
39      1       < 0.1%
38      1       < 0.1%
37      2       < 0.1%
36      2       < 0.1%
35      2       < 0.1%
34      1       < 0.1%

tag
Categorical

HIGH CARDINALITY

Distinct: 3276
Distinct (%): 5.2%
Missing: 0
Missing (%): 0.0%
Memory size: 489.5 KiB

Full Stack                       11382
Distributed Systems (Back-End)   10838
API Development (Back-End)       6277
ML / AI                          4204
Web Development (Front-End)      2971
Other values (3271)              26970

Length

Max length: 84
Median length: 59
Mean length: 15.5467418
Min length: 0

Characters and Unicode

Total characters: 973879
Distinct characters: 80
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2378
Unique (%): 3.8%

Sample

1st row: (empty)
2nd row: (empty)
3rd row: (empty)
4th row: (empty)
5th row: (empty)

Common Values

Value                            Count   Frequency (%)
Full Stack                       11382   18.2%
Distributed Systems (Back-End)   10838   17.3%
API Development (Back-End)       6277    10.0%
ML / AI                          4204    6.7%
Web Development (Front-End)      2971    4.7%
Product                          1791    2.9%
Data                             1580    2.5%
DevOps                           1573    2.5%
Security                         1197    1.9%
Networking                       1178    1.9%
Other values (3266)              19651   31.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
back-end17131
12.8%
full11388
 
8.5%
stack11388
 
8.5%
systems11130
 
8.3%
distributed10840
 
8.1%
development9426
 
7.1%
api6281
 
4.7%
5270
 
3.9%
ml4275
 
3.2%
ai4238
 
3.2%
Other values (1459)42121
31.6%

Most occurring characters

ValueCountFrequency (%)
e77924
 
8.0%
72275
 
7.4%
t72224
 
7.4%
n48374
 
5.0%
i43041
 
4.4%
a42330
 
4.3%
s42010
 
4.3%
d40490
 
4.2%
l39734
 
4.1%
c38353
 
3.9%
Other values (70)457124
46.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter649243
66.7%
Uppercase Letter179849
 
18.5%
Space Separator72276
 
7.4%
Open Punctuation23264
 
2.4%
Close Punctuation23263
 
2.4%
Dash Punctuation20162
 
2.1%
Other Punctuation4758
 
0.5%
Math Symbol990
 
0.1%
Decimal Number74
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e77924
12.0%
t72224
11.1%
n48374
 
7.5%
i43041
 
6.6%
a42330
 
6.5%
s42010
 
6.5%
d40490
 
6.2%
l39734
 
6.1%
c38353
 
5.9%
r30735
 
4.7%
Other values (17)174028
26.8%
Uppercase Letter
ValueCountFrequency (%)
S30330
16.9%
D25856
14.4%
E23818
13.2%
B17496
9.7%
F14724
8.2%
A14513
8.1%
I11653
 
6.5%
P9705
 
5.4%
M6255
 
3.5%
L4581
 
2.5%
Other values (17)20918
11.6%
Other Punctuation
ValueCountFrequency (%)
/4449
93.5%
,160
 
3.4%
&119
 
2.5%
.14
 
0.3%
;5
 
0.1%
?4
 
0.1%
#2
 
< 0.1%
@2
 
< 0.1%
'2
 
< 0.1%
:1
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
521
28.4%
218
24.3%
310
13.5%
69
12.2%
17
 
9.5%
44
 
5.4%
04
 
5.4%
81
 
1.4%
Math Symbol
ValueCountFrequency (%)
+987
99.7%
>2
 
0.2%
|1
 
0.1%
Space Separator
ValueCountFrequency (%)
72275
> 99.9%
1
 
< 0.1%
Open Punctuation
ValueCountFrequency (%)
(23264
100.0%
Close Punctuation
ValueCountFrequency (%)
)23263
100.0%
Dash Punctuation
ValueCountFrequency (%)
-20162
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin829092
85.1%
Common144787
 
14.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e77924
 
9.4%
t72224
 
8.7%
n48374
 
5.8%
i43041
 
5.2%
a42330
 
5.1%
s42010
 
5.1%
d40490
 
4.9%
l39734
 
4.8%
c38353
 
4.6%
r30735
 
3.7%
Other values (44)353877
42.7%
Common
ValueCountFrequency (%)
72275
49.9%
(23264
 
16.1%
)23263
 
16.1%
-20162
 
13.9%
/4449
 
3.1%
+987
 
0.7%
,160
 
0.1%
&119
 
0.1%
521
 
< 0.1%
218
 
< 0.1%
Other values (16)69
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII973876
> 99.9%
None2
 
< 0.1%
Punctuation1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e77924
 
8.0%
72275
 
7.4%
t72224
 
7.4%
n48374
 
5.0%
i43041
 
4.4%
a42330
 
4.3%
s42010
 
4.3%
d40490
 
4.2%
l39734
 
4.1%
c38353
 
3.9%
Other values (67)457121
46.9%
None
ValueCountFrequency (%)
ł1
50.0%
È1
50.0%
Punctuation
ValueCountFrequency (%)
1
100.0%

basesalary
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
SKEWED
ZEROS

Distinct: 1263
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1798.144663
Minimum: 0
Maximum: 2057772
Zeros: 2304
Zeros (%): 3.7%
Negative: 0
Negative (%): 0.0%
Memory size: 489.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 21
Q1: 88
median: 131
Q3: 181
95-th percentile: 270
Maximum: 2057772
Range: 2057772
Interquartile range (IQR): 93

Descriptive statistics

Standard deviation: 19271.93695
Coefficient of variation (CV): 10.71767881
Kurtosis: 2355.55225
Mean: 1798.144663
Median Absolute Deviation (MAD): 46
Skewness: 30.92064431
Sum: 112639378
Variance: 371407553.9
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count   Frequency (%)
0                     2304    3.7%
105                   437     0.7%
112                   407     0.6%
115                   404     0.6%
111                   398     0.6%
94                    391     0.6%
95                    390     0.6%
96                    390     0.6%
104                   388     0.6%
117                   387     0.6%
Other values (1253)   56746   90.6%

Value   Count   Frequency (%)
0       2304    3.7%
1       1       < 0.1%
3       6       < 0.1%
4       2       < 0.1%
5       4       < 0.1%
6       17      < 0.1%
7       15      < 0.1%
8       24      < 0.1%
9       31      < 0.1%
10      49      0.1%

Value     Count   Frequency (%)
2057772   1       < 0.1%
1098310   1       < 0.1%
560740    1       < 0.1%
556416    1       < 0.1%
537732    1       < 0.1%
535462    1       < 0.1%
451883    1       < 0.1%
368829    1       < 0.1%
358689    1       < 0.1%
357078    1       < 0.1%

stockgrantvalue
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
SKEWED
ZEROS

Distinct: 1281
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1134.865202
Minimum: 0
Maximum: 3140803
Zeros: 17181
Zeros (%): 27.4%
Negative: 0
Negative (%): 0.0%
Memory size: 489.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 23
Q3: 64
95-th percentile: 226
Maximum: 3140803
Range: 3140803
Interquartile range (IQR): 64

Descriptive statistics

Standard deviation: 25206.11464
Coefficient of variation (CV): 22.21066836
Kurtosis: 5985.821437
Mean: 1134.865202
Median Absolute Deviation (MAD): 23
Skewness: 64.07966542
Sum: 71090226
Variance: 635348215.1
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count   Frequency (%)
0                     17181   27.4%
6                     759     1.2%
15                    731     1.2%
4                     722     1.2%
3                     715     1.1%
13                    702     1.1%
10                    701     1.1%
14                    700     1.1%
7                     687     1.1%
5                     677     1.1%
Other values (1271)   39067   62.4%

Value   Count   Frequency (%)
0       17181   27.4%
1       501     0.8%
2       453     0.7%
3       715     1.1%
4       722     1.2%
5       677     1.1%
6       759     1.2%
7       687     1.1%
8       567     0.9%
9       665     1.1%

Value     Count   Frequency (%)
3140803   1       < 0.1%
2296728   1       < 0.1%
1860458   1       < 0.1%
1581036   1       < 0.1%
1076085   1       < 0.1%
1055359   1       < 0.1%
998780    1       < 0.1%
952489    1       < 0.1%
907808    1       < 0.1%
872514    1       < 0.1%

bonus
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 812
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 242.3171993
Minimum: 0
Maximum: 384026
Zeros: 15427
Zeros (%): 24.6%
Negative: 0
Negative (%): 0.0%
Memory size: 489.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
median: 13
Q3: 27
95-th percentile: 67
Maximum: 384026
Range: 384026
Interquartile range (IQR): 26

Descriptive statistics

Standard deviation: 3709.187697
Coefficient of variation (CV): 15.30715817
Kurtosis: 2422.19552
Mean: 242.3171993
Median Absolute Deviation (MAD): 13
Skewness: 36.90025253
Sum: 15179234
Variance: 13758073.37
Monotonicity: Not monotonic
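
The column sums give one concrete reason for the High correlation alerts on the compensation variables. The sketch below adds the reported Sum values of basesalary, stockgrantvalue and bonus and compares the result with the Sum of totalyearlycompensation. The totals agree exactly, which is consistent with total compensation being the sum of the three components, although the report alone does not show that the identity holds for every individual row.

```python
# Sum values copied from the Descriptive statistics of each variable.
base_sum = 112639378      # basesalary
stock_sum = 71090226      # stockgrantvalue
bonus_sum = 15179234      # bonus
total_sum = 198908838     # totalyearlycompensation

components_sum = base_sum + stock_sum + bonus_sum
difference = total_sum - components_sum

print(components_sum, difference)
```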
Histogram with fixed size bins (bins=50)
Value                Count   Frequency (%)
0                    15427   24.6%
15                   1589    2.5%
10                   1532    2.4%
12                   1515    2.4%
14                   1492    2.4%
7                    1487    2.4%
13                   1483    2.4%
11                   1480    2.4%
9                    1461    2.3%
6                    1456    2.3%
Other values (802)   33720   53.8%

Value   Count   Frequency (%)
0       15427   24.6%
1       864     1.4%
2       748     1.2%
3       1253    2.0%
4       1123    1.8%
5       1301    2.1%
6       1456    2.3%
7       1487    2.4%
8       1316    2.1%
9       1461    2.3%

Value    Count   Frequency (%)
384026   1       < 0.1%
180704   1       < 0.1%
158966   1       < 0.1%
158458   1       < 0.1%
153649   1       < 0.1%
151540   1       < 0.1%
147308   1       < 0.1%
140752   1       < 0.1%
130713   1       < 0.1%
116638   1       < 0.1%

gender
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 61.5 KiB

Male                              35702
(empty)                           19540
Female                            6999
Other                             400
Title: Senior Software Engineer   1

Length

Max length: 31
Median length: 4
Mean length: 2.982551643
Min length: 0

Characters and Unicode

Total characters: 186833
Distinct characters: 21
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: (empty)
2nd row: (empty)
3rd row: (empty)
4th row: (empty)
5th row: (empty)

Common Values

Value                             Count   Frequency (%)
Male                              35702   57.0%
(empty)                           19540   31.2%
Female                            6999    11.2%
Other                             400     0.6%
Title: Senior Software Engineer   1       < 0.1%
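
The Frequency (%) column is each count divided by the number of observations. The sketch below rebuilds the table from the reported counts; the empty string stands for the unlabelled category, and the variable names are illustrative. Note that 31.2% of records carry no gender value even though the variable reports 0 missing cells, and that one record holds a stray job title, so both would need handling before any analysis by gender.

```python
# Counts copied from the Common Values table; "" is the unlabelled category.
counts = {
    "Male": 35702,
    "": 19540,
    "Female": 6999,
    "Other": 400,
    "Title: Senior Software Engineer": 1,
}

total = sum(counts.values())
table = [
    (value, n, round(100 * n / total, 1))
    for value, n in sorted(counts.items(), key=lambda kv: -kv[1])
]

for value, n, pct in table:
    print(f"{value or '(empty)':<32} {n:>6} {pct:>5}%")
```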

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
male35702
82.8%
female6999
 
16.2%
other400
 
0.9%
title1
 
< 0.1%
senior1
 
< 0.1%
software1
 
< 0.1%
engineer1
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
e50105
26.8%
l42702
22.9%
a42702
22.9%
M35702
19.1%
F6999
 
3.7%
m6999
 
3.7%
r403
 
0.2%
t402
 
0.2%
h400
 
0.2%
O400
 
0.2%
Other values (11)19
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter143724
76.9%
Uppercase Letter43105
 
23.1%
Space Separator3
 
< 0.1%
Other Punctuation1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e50105
34.9%
l42702
29.7%
a42702
29.7%
m6999
 
4.9%
r403
 
0.3%
t402
 
0.3%
h400
 
0.3%
i3
 
< 0.1%
n3
 
< 0.1%
o2
 
< 0.1%
Other values (3)3
 
< 0.1%
Uppercase Letter
ValueCountFrequency (%)
M35702
82.8%
F6999
 
16.2%
O400
 
0.9%
S2
 
< 0.1%
E1
 
< 0.1%
T1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
3
100.0%
Other Punctuation
ValueCountFrequency (%)
:1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin186829
> 99.9%
Common4
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e50105
26.8%
l42702
22.9%
a42702
22.9%
M35702
19.1%
F6999
 
3.7%
m6999
 
3.7%
r403
 
0.2%
t402
 
0.2%
h400
 
0.2%
O400
 
0.2%
Other values (9)15
 
< 0.1%
Common
ValueCountFrequency (%)
3
75.0%
:1
 
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII186833
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e50105
26.8%
l42702
22.9%
a42702
22.9%
M35702
19.1%
F6999
 
3.7%
m6999
 
3.7%
r403
 
0.2%
t402
 
0.2%
h400
 
0.2%
O400
 
0.2%
Other values (11)19
 
< 0.1%

otherdetails
Categorical

HIGH CARDINALITY

Distinct: 40133
Distinct (%): 64.1%
Missing: 0
Missing (%): 0.0%
Memory size: 489.5 KiB

(empty)                22503
freaun                 2
braiock                2
schnuolls              2
twauesly               2
Other values (40128)   40131

Length

Max length: 130
Median length: 109
Mean length: 37.61950449
Min length: 0

Characters and Unicode

Total characters: 2356561
Distinct characters: 27
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
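
As a rough illustration of how such character properties can be tallied, the sketch below counts characters by Unicode general category using Python's standard `unicodedata` module. The function name `category_counts` and the sample strings are assumptions made for this example, not part of the report; the standard library does not expose script or block properties, so only categories are shown.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count characters by Unicode general category code (e.g. 'Ll', 'Zs')."""
    counts = Counter()
    for value in values:
        for ch in value:
            # unicodedata.category returns a two-letter code such as
            # 'Ll' (Lowercase Letter) or 'Zs' (Space Separator).
            counts[unicodedata.category(ch)] += 1
    return counts


# Hypothetical sample shaped like the otherdetails values (lowercase words and spaces).
sample = ["freaun", "schwocy prieurn", ""]
counts = category_counts(sample)
print(dict(counts))
```

For this sample only two categories appear, lowercase letters and space separators, which is the same pair of categories the report lists for this variable.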

Unique

Unique: 40125
Unique (%): 64.1%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

Value | Count | Frequency (%)
(empty) | 22503 | 35.9%
freaun | 2 | < 0.1%
braiock | 2 | < 0.1%
schnuolls | 2 | < 0.1%
twauesly | 2 | < 0.1%
whuott | 2 | < 0.1%
noilly | 2 | < 0.1%
cyclieuns | 2 | < 0.1%
glaarn qef spraak pleosk chielt shruith thiudly mcgian srieuk sziogh keedy | 1 | < 0.1%
mccoomp physieusts gritt kwiads froary swiesm kleirst daap hypeetch dynaiarty tadly | 1 | < 0.1%
Other values (40123) | 40123 | 64.1%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
prioucts | 8 | < 0.1%
mcnaiards | 8 | < 0.1%
theauny | 8 | < 0.1%
hyiump | 7 | < 0.1%
mcfaiats | 7 | < 0.1%
bleady | 7 | < 0.1%
pliurds | 7 | < 0.1%
meesly | 7 | < 0.1%
vuids | 6 | < 0.1%
czihn | 6 | < 0.1%
Other values (219162) | 321111 | > 99.9%

Most occurring characters

Value | Count | Frequency (%)
(space) | 281043 | 11.9%
s | 204363 | 8.7%
i | 157021 | 6.7%
a | 156293 | 6.6%
c | 135613 | 5.8%
r | 134123 | 5.7%
e | 131501 | 5.6%
o | 123409 | 5.2%
u | 115832 | 4.9%
h | 113294 | 4.8%
Other values (17) | 804069 | 34.1%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 2075518 | 88.1%
Space Separator | 281043 | 11.9%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
s | 204363 | 9.8%
i | 157021 | 7.6%
a | 156293 | 7.5%
c | 135613 | 6.5%
r | 134123 | 6.5%
e | 131501 | 6.3%
o | 123409 | 5.9%
u | 115832 | 5.6%
h | 113294 | 5.5%
t | 113211 | 5.5%
Other values (16) | 690858 | 33.3%

Space Separator
Value | Count | Frequency (%)
(space) | 281043 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 2075518 | 88.1%
Common | 281043 | 11.9%

Most frequent character per script

Latin
Value | Count | Frequency (%)
s | 204363 | 9.8%
i | 157021 | 7.6%
a | 156293 | 7.5%
c | 135613 | 6.5%
r | 134123 | 6.5%
e | 131501 | 6.3%
o | 123409 | 5.9%
u | 115832 | 5.6%
h | 113294 | 5.5%
t | 113211 | 5.5%
Other values (16) | 690858 | 33.3%

Common
Value | Count | Frequency (%)
(space) | 281043 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2356561 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 281043 | 11.9%
s | 204363 | 8.7%
i | 157021 | 6.7%
a | 156293 | 6.6%
c | 135613 | 5.8%
r | 134123 | 5.7%
e | 131501 | 5.6%
o | 123409 | 5.2%
u | 115832 | 4.9%
h | 113294 | 4.8%
Other values (17) | 804069 | 34.1%

cityid
Real number (ℝ≥0)

Distinct: 1045
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9856.201989
Minimum: 0
Maximum: 47926
Zeros: 2
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 489.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 3663
Q1: 7369
Median: 7839
Q3: 11521
95-th percentile: 15555
Maximum: 47926
Range: 47926
Interquartile range (IQR): 4152

Descriptive statistics

Standard deviation: 6679.104563
Coefficient of variation (CV): 0.6776550004
Kurtosis: 16.79525351
Mean: 9856.201989
Median Absolute Deviation (MAD): 2343
Skewness: 3.809126395
Sum: 617412205
Variance: 44610437.76
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
11527 | 8701 | 13.9%
7419 | 6796 | 10.8%
10182 | 4562 | 7.3%
11521 | 2649 | 4.2%
7322 | 2275 | 3.6%
7472 | 2246 | 3.6%
7422 | 2047 | 3.3%
10965 | 1527 | 2.4%
7300 | 1440 | 2.3%
7158 | 1431 | 2.3%
Other values (1035) | 28968 | 46.2%

Smallest values
Value | Count | Frequency (%)
0 | 2 | < 0.1%
10 | 1 | < 0.1%
1153 | 5 | < 0.1%
1180 | 101 | 0.2%
1182 | 124 | 0.2%
1205 | 9 | < 0.1%
1206 | 742 | 1.2%
1211 | 8 | < 0.1%
1221 | 3 | < 0.1%
1222 | 8 | < 0.1%

Largest values
Value | Count | Frequency (%)
47926 | 406 | 0.6%
47913 | 6 | < 0.1%
47618 | 2 | < 0.1%
42909 | 5 | < 0.1%
42903 | 10 | < 0.1%
42896 | 4 | < 0.1%
42877 | 10 | < 0.1%
42761 | 3 | < 0.1%
42631 | 88 | 0.1%
42624 | 1 | < 0.1%

dmaid
Categorical

HIGH CARDINALITY

Distinct: 150
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 127.7 KiB

Value | Count
807 | 20400
819 | 12343
0 | 9826
501 | 5156
506 | 1773
Other values (145) | 13144

Length

Max length: 3
Median length: 3
Mean length: 2.686184988
Min length: 0

Characters and Unicode

Total characters: 168268
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 26
Unique (%): < 0.1%

Sample

1st row: 807
2nd row: 807
3rd row: 819
4th row: 819
5th row: 0

Common Values

Value | Count | Frequency (%)
807 | 20400 | 32.6%
819 | 12343 | 19.7%
0 | 9826 | 15.7%
501 | 5156 | 8.2%
506 | 1773 | 2.8%
635 | 1556 | 2.5%
511 | 1374 | 2.2%
803 | 1342 | 2.1%
825 | 854 | 1.4%
602 | 843 | 1.3%
Other values (140) | 7175 | 11.5%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
807 | 20400 | 32.6%
819 | 12343 | 19.7%
0 | 9826 | 15.7%
501 | 5156 | 8.2%
506 | 1773 | 2.8%
635 | 1556 | 2.5%
511 | 1374 | 2.2%
803 | 1342 | 2.1%
825 | 854 | 1.4%
602 | 843 | 1.3%
Other values (139) | 7173 | 11.5%

Most occurring characters

Value | Count | Frequency (%)
0 | 41568 | 24.7%
8 | 36508 | 21.7%
7 | 22393 | 13.3%
1 | 22051 | 13.1%
5 | 15336 | 9.1%
9 | 12800 | 7.6%
6 | 7274 | 4.3%
3 | 4777 | 2.8%
2 | 4338 | 2.6%
4 | 1223 | 0.7%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 168268 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 41568 | 24.7%
8 | 36508 | 21.7%
7 | 22393 | 13.3%
1 | 22051 | 13.1%
5 | 15336 | 9.1%
9 | 12800 | 7.6%
6 | 7274 | 4.3%
3 | 4777 | 2.8%
2 | 4338 | 2.6%
4 | 1223 | 0.7%

Most occurring scripts

Value | Count | Frequency (%)
Common | 168268 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 41568 | 24.7%
8 | 36508 | 21.7%
7 | 22393 | 13.3%
1 | 22051 | 13.1%
5 | 15336 | 9.1%
9 | 12800 | 7.6%
6 | 7274 | 4.3%
3 | 4777 | 2.8%
2 | 4338 | 2.6%
4 | 1223 | 0.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 168268 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 41568 | 24.7%
8 | 36508 | 21.7%
7 | 22393 | 13.3%
1 | 22051 | 13.1%
5 | 15336 | 9.1%
9 | 12800 | 7.6%
6 | 7274 | 4.3%
3 | 4777 | 2.8%
2 | 4338 | 2.6%
4 | 1223 | 0.7%

rowNumber
Real number (ℝ≥0)

UNIQUE

Distinct: 62642
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 41694.72373
Minimum: 1
Maximum: 83875
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 489.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 3629.1
Q1: 20069.25
Median: 42019
Q3: 63021.75
95-th percentile: 79529.95
Maximum: 83875
Range: 83874
Interquartile range (IQR): 42952.5

Descriptive statistics

Standard deviation: 24488.86588
Coefficient of variation (CV): 0.587337286
Kurtosis: -1.225414232
Mean: 41694.72373
Median Absolute Deviation (MAD): 21449.5
Skewness: -0.01475390822
Sum: 2611840884
Variance: 599704552
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
1 | 1 | < 0.1%
56189 | 1 | < 0.1%
56191 | 1 | < 0.1%
56192 | 1 | < 0.1%
56194 | 1 | < 0.1%
56195 | 1 | < 0.1%
56196 | 1 | < 0.1%
56197 | 1 | < 0.1%
56198 | 1 | < 0.1%
56199 | 1 | < 0.1%
Other values (62632) | 62632 | > 99.9%

Smallest values
Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
9 | 1 | < 0.1%
10 | 1 | < 0.1%
11 | 1 | < 0.1%
12 | 1 | < 0.1%

Largest values
Value | Count | Frequency (%)
83875 | 1 | < 0.1%
83874 | 1 | < 0.1%
83872 | 1 | < 0.1%
83871 | 1 | < 0.1%
83870 | 1 | < 0.1%
83867 | 1 | < 0.1%
83865 | 1 | < 0.1%
83863 | 1 | < 0.1%
83861 | 1 | < 0.1%
83858 | 1 | < 0.1%

Interactions

[64 interaction plots, one per ordered pair of the 8 numeric variables; images not reproduced.]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
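
The calculation described above can be sketched in plain Python as follows. This is a minimal illustration rather than the code the report generator runs; the helper names `average_ranks` and `spearman_rho` are assumptions made for this example, and tied values are given the mean of the rank positions they occupy.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Covariance of the rank variables divided by the product of their standard deviations."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_rx, mean_ry = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_rx) * (b - mean_ry) for a, b in zip(rx, ry)) / n
    sd_rx = sqrt(sum((a - mean_rx) ** 2 for a in rx) / n)
    sd_ry = sqrt(sum((b - mean_ry) ** 2 for b in ry) / n)
    return cov / (sd_rx * sd_ry)


# A nonlinear but strictly increasing relationship gives rho close to 1.
x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]
rho = spearman_rho(x, y)
```

Because only the ranks enter the calculation, any strictly increasing transformation of either variable leaves ρ unchanged.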

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r; only its direction sets the sign.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
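
A minimal sketch of that calculation, assuming two equal-length lists of numbers, is shown below. The function name `pearson_r` and the sample data are illustrative only; the second and third calls check the invariance under changes of location and scale, and the sign flip when one variable is negated.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (sd_x * sd_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]

r = pearson_r(x, y)
# Shifting and positively rescaling either variable should leave r unchanged.
r_rescaled = pearson_r([2 * a + 3 for a in x], [10 * b - 5 for b in y])
# Negating one variable should flip the sign of r.
r_negated = pearson_r(x, [-b for b in y])
```

The same 1/n factor is used for the covariance and both standard deviations, so it cancels; using 1/(n-1) throughout would give the same r.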

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
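
The pair-counting definition above corresponds to the variant usually called τ-a, and can be sketched as follows. The function name `kendall_tau_a` is an assumption for this example. Statistical libraries often report the tie-adjusted τ-b instead, so values may differ from this sketch when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs; tied pairs count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Six pairs in total: five concordant, one discordant (the swapped middle values).
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
```

This direct version compares every pair and so runs in quadratic time, which is fine for illustration but slow on a dataset of this size.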

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
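
As a hedged illustration, the sketch below applies the commonly cited form of Bergsma's correction to a contingency table given as a list of rows. The function name `cramers_v_corrected` is an assumption, the code is not taken from the report generator, and it assumes every row and column total is non-zero.

```python
def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k contingency table of counts."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Correction terms: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return (phi2_corr / min(k_corr - 1, r_corr - 1)) ** 0.5


v_independent = cramers_v_corrected([[10, 20], [20, 40]])   # proportional rows
v_perfect = cramers_v_corrected([[50, 0], [0, 50]])         # one-to-one mapping
v_partial = cramers_v_corrected([[30, 10], [10, 30]])
```

The three tables are chosen to land at the two ends of the 0 to 1 range and at a point in between.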

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.

Sample

First rows

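(index) | timestamp | company | level | title | totalyearlycompensation | location | yearsofexperience | yearsatcompany | tag | basesalary | stockgrantvalue | bonus | gender | otherdetails | cityid | dmaid | rowNumber
0 | 2017-06-07 11:33:27 | Oracle | L3 | Product Manager | 186 | Redwood City, CA | 1 | 1 | | 160 | 17 | 9 | | | 7392 | 807 | 1
1 | 2017-06-10 17:11:29 | eBay | SE 2 | Software Engineer | 0 | San Francisco, CA | 3 | 4 | | 0 | 0 | 0 | | | 7419 | 807 | 2
2 | 2017-06-11 14:53:57 | Amazon | L7 | Product Manager | 126 | Seattle, WA | 10 | 0 | | 126 | 0 | 0 | | | 11527 | 819 | 3
3 | 2017-06-14 21:22:25 | Microsoft | 64 | Software Engineering Manager | 171348 | Redmond, WA | 13 | 13 | | 92799 | 55371 | 23178 | | | 11521 | 819 | 5
4 | 2017-06-16 10:44:01 | Amazon | L5 | Software Engineer | 214175 | Vancouver, BC, Canada | 13 | 1 | | 156335 | 0 | 57840 | | | 1320 | 0 | 6
5 | 2017-06-17 00:23:14 | Apple | M1 | Software Engineering Manager | 423 | Sunnyvale, CA | 6 | 6 | | 157 | 219 | 47 | | | 7472 | 807 | 7
6 | 2017-06-20 10:58:51 | Microsoft | 60 | Software Engineer | 0 | Mountain View, CA | 7 | 2 | | 0 | 0 | 0 | | | 7322 | 807 | 9
7 | 2017-06-20 18:49:59 | Amazon | L5 | Software Engineer | 153939 | Seattle, WA | 3 | 3 | | 101828 | 52111 | 0 | | | 11527 | 819 | 10
8 | 2017-06-21 17:27:47 | Microsoft | 63 | Software Engineer | 0 | Seattle, WA | 7 | 11 | | 0 | 0 | 0 | | | 11527 | 819 | 11
9 | 2017-06-22 12:37:51 | Microsoft | 65 | Software Engineering Manager | 302 | Redmond, WA | 16 | 13 | | 192 | 61 | 49 | | | 11521 | 819 | 12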

Last rows

(index) | timestamp | company | level | title | totalyearlycompensation | location | yearsofexperience | yearsatcompany | tag | basesalary | stockgrantvalue | bonus | gender | otherdetails | cityid | dmaid | rowNumber
62632 | 2021-08-17 05:57:34 | Refinitiv | N/A | Product Manager | 125 | New York, NY | 4 | 3 | Product | 97 | 0 | 28 | | schwocy prieurn | 10182 | 501 | 83858
62633 | 2021-08-17 06:47:07 | NCR | 10 | Software Engineer | 113 | Atlanta, GA | 3 | 2 | Full Stack | 113 | 0 | 0 | Male | leerf suorth kroesch vaapt hypiups symbioub hypuny saiong physiosts dynaiads sciil scroably treagy fiaold | 7839 | 524 | 83861
62634 | 2021-08-17 07:15:05 | Deloitte | Manager | Product Manager | 230 | Washington, DC | 5 | 0 | Product | 190 | 0 | 40 | | dweiass plaany typuatts schriiks spriell swoals syniefy mcgeauv claury graond preath dynuentz slauecks | 40303 | 511 | 83863
62635 | 2021-08-17 07:50:25 | Raytheon Technologies | P2 | Software Engineer | 125 | Portsmouth, RI | 4 | 2 | Distributed Systems (Back-End) | 122 | 0 | 3 | Male | szes skauns frairt struisly krauerr pfurf bloold bluesh sqiofy | 35382 | 521 | 83865
62636 | 2021-08-17 07:55:47 | AT&T | Principal | Data Scientist | 229 | Atlanta, GA | 7 | 4 | general | 214 | 0 | 15 | | keuny twiintz scriitt scoiaz rhaub soob wheiapp spof xeult schotz | 7839 | 524 | 83867
62637 | 2021-08-17 08:16:36 | Amazon | L6 | Product Manager | 272 | Seattle, WA | 7 | 0 | Analytic | 176 | 51 | 45 | Female | scrielt schwists fuech xauem typaiarty siids qiaongs throiak dynound twaiarty hypeuld | 11527 | 819 | 83870
62638 | 2021-08-17 08:22:17 | Fidelity Investments | L3 | Software Engineer | 50 | Durham, NC | 0 | 0 | Full Stack | 43 | 0 | 7 | Male | pauss jiusts liaop rhiitts proing smiodly spluiel clids spiun knaiarth cycleorn theops | 9606 | 560 | 83871
62639 | 2021-08-17 08:24:56 | Cisco | Grade 8 | Software Engineer | 200 | San Jose, CA | 3 | 6 | Networking | 179 | 7 | 14 | Male | triasm froidy peusk juieck skaorr | 7422 | 807 | 83872
62640 | 2021-08-17 08:26:21 | HSBC | GCB5 | Software Engineer | 86 | New York, NY | 10 | 5 | Full Stack | 72 | 0 | 14 | | hypoiarly khoiarts krol sqierg | 10182 | 501 | 83874
62641 | 2021-08-17 08:28:57 | Adobe | Software Engineer 5 | Product Designer | 382 | San Francisco, CA | 7 | 0 | User Experience (UX) | 230 | 139 | 13 | Male | fluingly toasly croesm fiuk gniosk khaiarts strucy | 7419 | 807 | 83875